Reland "[NFC][lldb] Speed up lookup of shared modules" (229d860) #152607

Merged
merged 4 commits into from
Aug 12, 2025

Conversation

augusto2112
Contributor

The previous commit was reverted because it broke a test on the bots.

Original commit message:

By profiling LLDB debugging a Swift application without a dSYM and with a large number of .o files, I identified that querying shared modules was the biggest bottleneck when running "frame variable" in cases where Clang types need to be searched.

One of the reasons for that slowness is that the shared module list can grow very large, and the search through it is O(n).

To solve this issue, this patch adds a new hashmap to the shared module list whose key is the name of the module, and the value is all the modules that share that name. This should speed up any search where the query contains the module name.
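As a rough illustration of the idea (not the patch itself), the sketch below keeps a flat list plus a filename-keyed map, so a query that carries a filename only touches the modules sharing that name. `Module`, `NameIndexedModules`, `FindByName` and `FindByUUID` are hypothetical stand-ins, and `std::unordered_map` replaces the `llvm::DenseMap` used in the patch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for lldb's Module; only the fields needed here.
struct Module {
  std::string filename;
  std::string uuid;
};
using ModuleSP = std::shared_ptr<Module>;

// Hypothetical wrapper: a flat list plus an index from filename to every
// module that shares that filename.
class NameIndexedModules {
public:
  void Append(const ModuleSP &module_sp) {
    if (!module_sp)
      return;
    m_list.push_back(module_sp);
    // Modules without a filename live only in the list.
    if (!module_sp->filename.empty())
      m_name_to_modules[module_sp->filename].push_back(module_sp);
  }

  // Average O(1) bucket lookup; the bucket itself is usually tiny.
  std::vector<ModuleSP> FindByName(const std::string &filename) const {
    auto it = m_name_to_modules.find(filename);
    if (it == m_name_to_modules.end())
      return {};
    return it->second;
  }

  // Queries without a filename still need the O(n) scan of the full list.
  ModuleSP FindByUUID(const std::string &uuid) const {
    for (const ModuleSP &module_sp : m_list)
      if (module_sp->uuid == uuid)
        return module_sp;
    return nullptr;
  }

private:
  std::vector<ModuleSP> m_list;
  std::unordered_map<std::string, std::vector<ModuleSP>> m_name_to_modules;
};
```

The value type is a vector because several modules can share one filename, for example the same object file name built for different architectures.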

rdar://156753350

@augusto2112
Contributor Author

@JDevlieghere I addressed the comments you left on #152054.

@llvmbot
Member

llvmbot commented Aug 7, 2025

@llvm/pr-subscribers-lldb

Author: Augusto Noronha (augusto2112)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/152607.diff

1 Files Affected:

  • (modified) lldb/source/Core/ModuleList.cpp (+229)
diff --git a/lldb/source/Core/ModuleList.cpp b/lldb/source/Core/ModuleList.cpp
index d5ddc2b249e56..bb732a28eddc9 100644
--- a/lldb/source/Core/ModuleList.cpp
+++ b/lldb/source/Core/ModuleList.cpp
@@ -755,6 +755,235 @@ size_t ModuleList::GetIndexForModule(const Module *module) const {
 }
 
 namespace {
+/// A wrapper around ModuleList for shared modules. Provides fast lookups for
+/// file-based ModuleSpec queries.
+class SharedModuleList {
+public:
+  /// Finds all the modules matching the module_spec, and adds them to \p
+  /// matching_module_list.
+  void FindModules(const ModuleSpec &module_spec,
+                   ModuleList &matching_module_list) const {
+    std::lock_guard<std::recursive_mutex> guard(GetMutex());
+    // Try map first for performance - if found, skip expensive full list
+    // search.
+    if (FindModulesInMap(module_spec, matching_module_list))
+      return;
+    m_list.FindModules(module_spec, matching_module_list);
+    // Assert that if modules were found in the list but not the map, it's
+    // because the module_spec has no filename or the found module has a
+    // different filename. For example, when searching by UUID and finding a
+    // module with an alias.
+    assert((matching_module_list.IsEmpty() ||
+            module_spec.GetFileSpec().GetFilename().IsEmpty() ||
+            module_spec.GetFileSpec().GetFilename() !=
+                matching_module_list.GetModuleAtIndex(0)
+                    ->GetFileSpec()
+                    .GetFilename()) &&
+           "Search by name not found in SharedModuleList's map");
+  }
+
+  ModuleSP FindModule(const Module *module_ptr) {
+    if (!module_ptr)
+      return ModuleSP();
+
+    std::lock_guard<std::recursive_mutex> guard(GetMutex());
+    if (ModuleSP result = FindModuleInMap(module_ptr))
+      return result;
+    return m_list.FindModule(module_ptr);
+  }
+
+  // UUID searches bypass map since UUIDs aren't indexed by filename.
+  ModuleSP FindModule(const UUID &uuid) const {
+    return m_list.FindModule(uuid);
+  }
+
+  void Append(const ModuleSP &module_sp, bool use_notifier) {
+    if (!module_sp)
+      return;
+    std::lock_guard<std::recursive_mutex> guard(GetMutex());
+    m_list.Append(module_sp, use_notifier);
+    AddToMap(module_sp);
+  }
+
+  size_t RemoveOrphans(bool mandatory) {
+    std::unique_lock<std::recursive_mutex> lock(GetMutex(), std::defer_lock);
+    if (mandatory) {
+      lock.lock();
+    } else {
+      if (!lock.try_lock())
+        return 0;
+    }
+    size_t total_count = 0;
+    size_t run_count;
+    do {
+      // Remove indexed orphans first, then remove non-indexed orphans. This
+      // order is important because the shared count will be different if a
+      // module is indexed or not.
+      run_count = RemoveOrphansFromMapAndList();
+      run_count += m_list.RemoveOrphans(mandatory);
+      total_count += run_count;
+      // Because removing orphans might make new orphans, remove from both
+      // containers until a fixed-point is reached.
+    } while (run_count != 0);
+
+    return total_count;
+  }
+
+  bool Remove(const ModuleSP &module_sp, bool use_notifier = true) {
+    if (!module_sp)
+      return false;
+    std::lock_guard<std::recursive_mutex> guard(GetMutex());
+    RemoveFromMap(module_sp.get());
+    return m_list.Remove(module_sp, use_notifier);
+  }
+
+  void ReplaceEquivalent(const ModuleSP &module_sp,
+                         llvm::SmallVectorImpl<lldb::ModuleSP> *old_modules) {
+    std::lock_guard<std::recursive_mutex> guard(GetMutex());
+    m_list.ReplaceEquivalent(module_sp, old_modules);
+    ReplaceEquivalentInMap(module_sp);
+  }
+
+  bool RemoveIfOrphaned(const Module *module_ptr) {
+    std::lock_guard<std::recursive_mutex> guard(GetMutex());
+    RemoveFromMap(module_ptr, /*if_orphaned=*/true);
+    return m_list.RemoveIfOrphaned(module_ptr);
+  }
+
+  std::recursive_mutex &GetMutex() const { return m_list.GetMutex(); }
+
+private:
+  ModuleSP FindModuleInMap(const Module *module_ptr) const {
+    if (!module_ptr->GetFileSpec().GetFilename())
+      return ModuleSP();
+    ConstString name = module_ptr->GetFileSpec().GetFilename();
+    auto it = m_name_to_modules.find(name);
+    if (it == m_name_to_modules.end())
+      return ModuleSP();
+    const llvm::SmallVectorImpl<ModuleSP> &vector = it->second;
+    for (const ModuleSP &module_sp : vector) {
+      if (module_sp.get() == module_ptr)
+        return module_sp;
+    }
+    return ModuleSP();
+  }
+
+  bool FindModulesInMap(const ModuleSpec &module_spec,
+                        ModuleList &matching_module_list) const {
+    auto it = m_name_to_modules.find(module_spec.GetFileSpec().GetFilename());
+    if (it == m_name_to_modules.end())
+      return false;
+    const llvm::SmallVectorImpl<ModuleSP> &vector = it->second;
+    bool found = false;
+    for (const ModuleSP &module_sp : vector) {
+      if (module_sp->MatchesModuleSpec(module_spec)) {
+        matching_module_list.Append(module_sp);
+        found = true;
+      }
+    }
+    return found;
+  }
+
+  void AddToMap(const ModuleSP &module_sp) {
+    ConstString name = module_sp->GetFileSpec().GetFilename();
+    if (name.IsEmpty())
+      return;
+    m_name_to_modules[name].push_back(module_sp);
+  }
+
+  void RemoveFromMap(const Module *module_ptr, bool if_orphaned = false) {
+    if (!module_ptr)
+      return;
+    ConstString name = module_ptr->GetFileSpec().GetFilename();
+    if (!m_name_to_modules.contains(name))
+      return;
+    llvm::SmallVectorImpl<ModuleSP> &vec = m_name_to_modules[name];
+    for (auto *it = vec.begin(); it != vec.end(); ++it) {
+      if (it->get() == module_ptr) {
+        // use_count == 2 means only held by map and list (orphaned).
+        constexpr long kUseCountOrphaned = 2;
+        if (!if_orphaned || it->use_count() == kUseCountOrphaned) {
+          vec.erase(it);
+          break;
+        }
+      }
+    }
+  }
+
+  void ReplaceEquivalentInMap(const ModuleSP &module_sp) {
+    RemoveEquivalentModulesFromMap(module_sp);
+    AddToMap(module_sp);
+  }
+
+  void RemoveEquivalentModulesFromMap(const ModuleSP &module_sp) {
+    ConstString name = module_sp->GetFileSpec().GetFilename();
+    if (name.IsEmpty())
+      return;
+
+    auto it = m_name_to_modules.find(name);
+    if (it == m_name_to_modules.end())
+      return;
+
+    // First remove any equivalent modules. Equivalent modules are modules
+    // whose path, platform path and architecture match.
+    ModuleSpec equivalent_module_spec(module_sp->GetFileSpec(),
+                                      module_sp->GetArchitecture());
+    equivalent_module_spec.GetPlatformFileSpec() =
+        module_sp->GetPlatformFileSpec();
+
+    llvm::SmallVectorImpl<ModuleSP> &vec = it->second;
+    llvm::erase_if(vec, [&equivalent_module_spec](ModuleSP &element) {
+      return element->MatchesModuleSpec(equivalent_module_spec);
+    });
+  }
+
+  /// Remove orphans from the vector and return the removed modules.
+  ModuleList RemoveOrphansFromVector(llvm::SmallVectorImpl<ModuleSP> &vec) {
+    ModuleList to_remove;
+    for (int i = vec.size() - 1; i >= 0; --i) {
+      ModuleSP module = vec[i];
+      constexpr long kUseCountOrphaned = 2;
+      constexpr long kUseCountLocalVariable = 1;
+      // use_count == 3: map + list + local variable = orphaned.
+      if (module.use_count() == kUseCountOrphaned + kUseCountLocalVariable) {
+        to_remove.Append(module);
+        vec.erase(vec.begin() + i);
+      }
+    }
+    return to_remove;
+  }
+
+  /// Remove orphans that exist in both the map and list. This does not remove
+  /// any orphans that exist exclusively on the list.
+  ///
+  /// The mutex must be locked by the caller.
+  int RemoveOrphansFromMapAndList() {
+    // Modules might hold shared pointers to other modules, so removing one
+    // module might orphan other modules. Keep removing modules until
+    // there are no further modules that can be removed.
+    bool made_progress = true;
+    int remove_count = 0;
+    while (made_progress) {
+      made_progress = false;
+      for (auto &[name, vec] : m_name_to_modules) {
+        if (vec.empty())
+          continue;
+        ModuleList to_remove = RemoveOrphansFromVector(vec);
+        remove_count += to_remove.GetSize();
+        made_progress = !to_remove.IsEmpty();
+        m_list.Remove(to_remove);
+      }
+    }
+    return remove_count;
+  }
+
+  ModuleList m_list;
+
+  /// A hash map from a module's filename to all the modules that share that
+  /// filename, for fast module lookups by name.
+  llvm::DenseMap<ConstString, llvm::SmallVector<ModuleSP, 1>> m_name_to_modules;
+};
+
 struct SharedModuleListInfo {
   ModuleList module_list;
   ModuleListProperties module_list_properties;
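
The locking at the top of `RemoveOrphans` in the diff above follows a common pattern: block when the cleanup is mandatory, otherwise give up immediately if the lock is contended. A minimal standalone sketch, assuming a hypothetical `RunCleanup` helper that is not part of the patch:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical helper: runs `cleanup` while holding `mutex`. When `mandatory`
// is false and the lock cannot be taken right away, the work is skipped and 0
// is reported, mirroring the early return in RemoveOrphans.
template <typename Fn>
std::size_t RunCleanup(std::recursive_mutex &mutex, bool mandatory,
                       Fn cleanup) {
  // defer_lock: construct the guard without locking yet.
  std::unique_lock<std::recursive_mutex> lock(mutex, std::defer_lock);
  if (mandatory) {
    lock.lock(); // Block until the lock is available.
  } else if (!lock.try_lock()) {
    return 0; // Best effort only: someone else holds the lock.
  }
  return cleanup(); // The lock is released when `lock` goes out of scope.
}
```

Either way the `std::unique_lock` owns the mutex on the path that does the work, so there is a single unlock point.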

Member

@JDevlieghere JDevlieghere left a comment

I left a few more comments.

I'm generally very wary of passing shared pointers by const ref. It's especially dangerous when they get removed from lists, which is exactly what this patch is doing. It's a big foot gun that I would normally push back on, but I realize that (1) the refcount is what this patch is relying on to remove the modules and (2) that this is a performance optimization so maybe the overhead of synchronizing the refcount would matter.
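
To make the refcount point concrete, here is a small sketch, assuming a hypothetical empty `Module` and two hypothetical predicates. It shows why the parameter-passing style changes the threshold: a by-value parameter is itself an owner, a const reference is not.

```cpp
#include <cassert>
#include <memory>

struct Module {}; // Hypothetical stand-in for lldb's Module.
using ModuleSP = std::shared_ptr<Module>;

// Assumption taken from the patch's comments: an orphaned module is owned
// only by the list and by the map, so its use count is 2.
constexpr long kUseCountOrphaned = 2;

// By const reference: no extra owner is created, so use_count() reflects only
// the long-lived owners.
bool IsOrphanedByRef(const ModuleSP &module_sp) {
  return module_sp.use_count() == kUseCountOrphaned;
}

// By value: the parameter is one more owner, so the threshold shifts by one.
bool IsOrphanedByValue(ModuleSP module_sp) {
  return module_sp.use_count() == kUseCountOrphaned + 1;
}
```

Note that `use_count()` is only a reliable signal while nothing else can copy or release the pointer concurrently, which is presumably why the patch takes the list mutex before checking it.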

for (auto *it = vec.begin(); it != vec.end(); ++it) {
if (it->get() == module_ptr) {
// use_count == 2 means only held by map and list (orphaned).
constexpr long kUseCountOrphaned = 2;
Member

Make this a static outside the function so you don't have to repeat it in RemoveOrphansFromVector as that kind of defeats the purpose of a constant.

found = true;
}
}
return found;
Member

Can we eliminate found by returning !matching_module_list.empty()? Or even better, can this just return the matching_module_list and then the caller can decide if it needs to check if it's empty?
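
One possible shape for that suggestion, sketched with hypothetical stand-in types (`Module` with only a filename and an architecture, `std::vector` instead of `ModuleList`): the map lookup returns its matches, and the caller treats an empty result as the signal to fall back to the full scan.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for lldb's Module and the name index.
struct Module {
  std::string filename;
  std::string arch;
};
using ModuleSP = std::shared_ptr<Module>;
using NameMap = std::unordered_map<std::string, std::vector<ModuleSP>>;

// Fast path: returns the matches directly instead of a bool plus out-param.
std::vector<ModuleSP> FindModulesInMap(const NameMap &name_to_modules,
                                       const std::string &filename,
                                       const std::string &arch) {
  std::vector<ModuleSP> matches;
  auto it = name_to_modules.find(filename);
  if (it == name_to_modules.end())
    return matches;
  for (const ModuleSP &module_sp : it->second)
    if (module_sp->arch == arch)
      matches.push_back(module_sp);
  return matches;
}

// The caller decides what an empty result means: here, a scan of the full
// list, where an empty filename in the query matches any module.
std::vector<ModuleSP> FindModules(const NameMap &name_to_modules,
                                  const std::vector<ModuleSP> &full_list,
                                  const std::string &filename,
                                  const std::string &arch) {
  std::vector<ModuleSP> matches =
      FindModulesInMap(name_to_modules, filename, arch);
  if (!matches.empty())
    return matches;
  for (const ModuleSP &module_sp : full_list)
    if ((filename.empty() || module_sp->filename == filename) &&
        module_sp->arch == arch)
      matches.push_back(module_sp);
  return matches;
}
```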

continue;
ModuleList to_remove = RemoveOrphansFromVector(vec);
remove_count += to_remove.GetSize();
made_progress = !to_remove.IsEmpty();
Member

Could we replace made_progress with a variable that keeps track of the size at the previous iteration and then compare it to the size after an iteration to know whether we've reached the fixed point? My (minor) concern is that you have to keep this variable in sync with the operation, and checking to_remove does so in an indirect way.
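
A minimal sketch of that size-tracking loop, under simplifying assumptions: a single `std::vector` is the only container, so "orphaned" means a use count of 1, and each hypothetical `Module` may keep one other module alive through a `dependency` pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical module that can keep another module alive.
struct Module {
  std::shared_ptr<Module> dependency;
};
using ModuleSP = std::shared_ptr<Module>;

// Removes modules owned only by `modules`. Destroying one module can orphan
// its dependency, so repeat until a pass leaves the size unchanged.
std::size_t RemoveOrphansToFixedPoint(std::vector<ModuleSP> &modules) {
  const std::size_t initial_size = modules.size();
  std::size_t previous_size;
  do {
    previous_size = modules.size();
    modules.erase(std::remove_if(modules.begin(), modules.end(),
                                 [](const ModuleSP &module_sp) {
                                   return module_sp.use_count() == 1;
                                 }),
                  modules.end());
    // The fixed point is reached when this pass removed nothing.
  } while (modules.size() != previous_size);
  return initial_size - modules.size();
}
```

The loop condition is derived from the container itself, so there is no separate flag that has to be kept in sync with the removal step.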

github-actions bot commented Aug 8, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

ModuleList RemoveOrphansFromVector(llvm::SmallVectorImpl<ModuleSP> &vec) {
// remove_if moves the elements that match the condition to the end of the
// container, and returns an iterator to the first element that was moved.
auto *to_remove_start = llvm::remove_if(vec, [](const ModuleSP &module) {
Contributor Author

@JDevlieghere I know you don't like references to shared pointers, but I think here it's cleaner than adding a second constant and leaving a comment explaining why the count should be equal to kUseCountOrphaned + kUseCountLocalVariable.
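
For comparison, a sketch of the same "split off the orphans" step that avoids the extra local owner. It deliberately swaps in `std::stable_partition` rather than `remove_if`: the standard `std::remove_if` leaves the tail elements in a valid but unspecified state, whereas a partition keeps them intact so they can be collected. `Module` and the constant are hypothetical stand-ins, with `std::vector` in place of the LLVM containers.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <memory>
#include <vector>

struct Module {}; // Hypothetical stand-in for lldb's Module.
using ModuleSP = std::shared_ptr<Module>;

// Assumption from the patch's comments: list + map are the only owners.
constexpr long kUseCountOrphaned = 2;

// Moves orphans out of `vec` and returns them. The lambda takes a const
// reference, so inspecting an element does not change its use count.
std::vector<ModuleSP> RemoveOrphansFromVector(std::vector<ModuleSP> &vec) {
  // Non-orphans stay in front in their original order; orphans go to the end.
  auto first_orphan = std::stable_partition(
      vec.begin(), vec.end(), [](const ModuleSP &module_sp) {
        return module_sp.use_count() != kUseCountOrphaned;
      });
  std::vector<ModuleSP> removed(std::make_move_iterator(first_orphan),
                                std::make_move_iterator(vec.end()));
  vec.erase(first_orphan, vec.end());
  return removed;
}
```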

@augusto2112
Contributor Author

ping @JDevlieghere @felipepiovezan :)

Contributor

@felipepiovezan felipepiovezan left a comment

I think there is a whole class of pointers here that should have been references. But if you're trying to mimic the behavior of a different class API, we should clean up both of them separately. Otherwise LGTM

std::recursive_mutex &GetMutex() const { return m_list.GetMutex(); }

private:
ModuleSP FindModuleInMap(const Module *module_ptr) const {
Contributor

nit: if we're not checking the pointer for null, it should be a reference to express that.

}
}

void AddToMap(const ModuleSP &module_sp) {
Contributor

Likewise here: we dereference the pointer without checking, so we should make this a reference.

}
}

void ReplaceEquivalentInMap(const ModuleSP &module_sp) {
Contributor

Note that once we change AddToMap, this will also start dereferencing the pointer unconditionally, so we should also make this a reference

return m_list.Remove(module_sp, use_notifier);
}

void ReplaceEquivalent(const ModuleSP &module_sp,
Contributor

and then here.

@augusto2112 augusto2112 merged commit bd1b1a5 into llvm:main Aug 12, 2025
9 checks passed
@@ -847,23 +845,23 @@ class SharedModuleList {

   bool RemoveIfOrphaned(const Module *module_ptr) {
     std::lock_guard<std::recursive_mutex> guard(GetMutex());
-    RemoveFromMap(module_ptr, /*if_orphaned=*/true);
+    RemoveFromMap(*module_ptr, /*if_orphaned=*/true);
Contributor

this change is not safe, right? The argument is not guaranteed to be non-null.

-    if (!module_ptr)
-      return;
-    ConstString name = module_ptr->GetFileSpec().GetFilename();
+  void RemoveFromMap(const Module &module, bool if_orphaned = false) {
Contributor

I'm not sure why this one changed, we were checking for null here, so we can't assume it is now non-null....
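
One way to reconcile the two review threads, sketched with hypothetical names (`RemoveFromNames`, `RemoveIfPresent`, and a `Module` reduced to a filename): the internal helper takes a reference to express "never null", while the pointer-taking entry point keeps the null check and only then dereferences.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for lldb's Module.
struct Module {
  std::string filename;
};

// Internal helper: a reference parameter documents that the module must exist.
bool RemoveFromNames(std::vector<std::string> &names, const Module &module) {
  auto it = std::find(names.begin(), names.end(), module.filename);
  if (it == names.end())
    return false;
  names.erase(it);
  return true;
}

// Entry point that may receive null: check first, dereference second.
bool RemoveIfPresent(std::vector<std::string> &names,
                     const Module *module_ptr) {
  if (!module_ptr)
    return false;
  return RemoveFromNames(names, *module_ptr);
}
```

With this split, the null check lives in exactly one place, and converting a helper to a reference never silently drops it.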

4 participants